ABSTRACT

A DNA copy of the mRNA that encodes the nucleocapsid protein of Mouse 
Hepatitis Virus JHM has been cloned into pAT153. The DNA copy specifically 
inhibited the synthesis in vitro of the nucleocapsid protein. The cDNA was 
subcloned into M13 vectors and the entire sequence, 1767 bases including a
15-base terminal poly(A) tract, has been determined by chain-terminator
sequencing. The sequence contained an open reading frame that could encode a
basic protein of mol.wt. 49700. From the predicted sequence it was apparent
that the nucleocapsid protein has 5 basic regions, two of which are located
near the middle of the sequence. A serine-rich region was also located, a
feature which may be of functional importance as the nucleocapsid protein is
phosphorylated at serine residues. The carboxy terminus of the nucleocapsid
protein was found to be acidic. The 5' non-coding sequence contained a triple
repeat of the pentamer AATCT, a structural feature which may play a
significant role during the production of subgenomic viral mRNAs.


INTRODUCTION 

Coronaviruses are enveloped, positive-stranded RNA viruses which infect 
vertebrates and are responsible for diseases of clinical and economic impor- 
tance, in particular respiratory and gastro-intestinal disorders (1). The
most-studied member of the group is Murine Hepatitis Virus (MHV). Depending 
upon the age and genetic background of the host, different MHV strains 
display differences in virulence, pathogenicity and in organ and cell tro- 
pisms. They therefore provide useful models for a variety of disease pro- 
cesses. Of particular interest is the neurotropic MHV strain, JHM, which has 
the ability to induce acute and subacute disorders of the central nervous 
system in rodents and has been used as a model for virus-induced
demyelination (2-4).

The MHV genome is a linear, unsegmented, infectious RNA which is about
18 kilobases long. MHV replicates in the cytoplasm of infected cells and the
viral genetic information is expressed as one genome-sized and six subgenomic
mRNAs. The largest (genome-sized) mRNA is termed mRNA1 and the smaller mRNAs
are numbered in order of decreasing size. These mRNAs are synthesized in
non-equimolar amounts but in constant proportions. For a review of
coronavirus replication, see ref. 1. Analysis by RNase T1 oligonucleotide
fingerprinting and by hybridization to virus-specific cDNA probes (5-10)
reveals that these mRNAs are 3'-coterminal, forming a "nested set".
Furthermore, analysis of RNase T1 oligonucleotides from subgenomic mRNAs and
from the equivalent regions of the genome suggests that the subgenomic mRNAs
bear a leader sequence derived from the 5' end of the genome (7, 8). The
translation of MHV mRNAs in vivo, in cell-free systems or in oocytes (11-13)
shows that each mRNA is translated independently to produce a single protein,
the size of which corresponds to the coding capacity of the 5' sequences not
found in the next smaller mRNA. The smallest and most abundant mRNA (mRNA7)
encodes the virion nucleocapsid protein, a basic, phosphorylated polypeptide
of 50000-60000 mol.wt. The next most abundant RNA, mRNA6, encodes the virion
matrix glycoprotein(s) (23000 to 25000 mol.wt.) and the third major
intracellular RNA, mRNA3, encodes the virion peplomer glycoprotein(s) (90000
and 180000 mol.wt.). The remaining intracellular mRNAs, which are found in
lesser amounts, are thought to encode viral non-structural proteins.
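The nested-set argument can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not an analysis from this study: the mRNA lengths are invented placeholders, and the average mass of 110 Da per amino acid is an assumed rule of thumb. Because the mRNAs share a common 3' end, subtracting the length of the next smaller mRNA gives the 5'-unique region whose coding capacity limits the protein each mRNA can encode.

```python
AVG_RESIDUE_MASS_DA = 110  # assumed rule-of-thumb average mass per amino acid


def unique_region_capacity(mrnas):
    """Estimate the coding capacity of each 5'-unique region in a nested set.

    mrnas: list of (name, length_nt) ordered from largest to smallest.
    Because all members share the same 3' end, the part of an mRNA that is
    absent from the next smaller one lies at its 5' end. Leader and
    non-coding sequences are ignored, so the results are upper bounds.
    Returns a list of (name, unique_nt, max_residues, approx_mass_da).
    """
    capacities = []
    for i, (name, length) in enumerate(mrnas):
        # The smallest mRNA has no smaller partner, so all of it counts.
        next_length = mrnas[i + 1][1] if i + 1 < len(mrnas) else 0
        unique_nt = length - next_length
        max_residues = unique_nt // 3  # whole codons only
        capacities.append(
            (name, unique_nt, max_residues, max_residues * AVG_RESIDUE_MASS_DA)
        )
    return capacities


if __name__ == "__main__":
    # Hypothetical lengths, for illustration only.
    hypothetical = [("mRNA5", 3000), ("mRNA6", 2600), ("mRNA7", 1800)]
    for row in unique_region_capacity(hypothetical):
        print(row)
```

The estimate is deliberately crude: it only shows why a 3'-coterminal set, with translation confined to the 5'-unique region, predicts one protein per mRNA with a bounded size.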

In the experiments described here, we have isolated DNA copies of 
MHV-JHM mRNA7 and obtained a complete sequence of the mRNA. This nucleotide 
sequence and the predicted sequence of the encoded protein can be compared 
with the equivalent sequences recently obtained for the hepatotropic MHV
strain, A59 (14).


MATERIALS AND METHODS 
Chemicals 

Avian myeloblastosis virus reverse transcriptase was supplied by Life 
Sciences (St. Petersburg, Florida). Radiochemicals were supplied by Amersham 
Buchler (Braunschweig, F.R.G.). M13 pentadecamer sequencing and
hybridization probes, oligo(dT) and oligo(dG) were obtained from PL
Biochemicals (St. Goar, F.R.G.). Escherichia coli DNA polymerase I (large
fragment), S1 nuclease and terminal deoxynucleotidyl transferase were
obtained from Bethesda Research Laboratories (Neu-Isenburg, F.R.G.). New
England Nuclear (Dreieich, F.R.G.) supplied T4 DNA ligase. Restriction
enzymes were obtained from PL Biochemicals, Bethesda Research Laboratories
and Boehringer Mannheim (Mannheim, F.R.G.).


Synthesis and Cloning of cDNA 
Polyadenylated RNA was isolated from Sac(-) cells that had been infected
with MHV-JHM, as previously described (15). Double-stranded cDNA was
prepared from 10 µg of polyadenylated RNA, using oligo(dT) and oligo(dG) to
prime first- and second-strand synthesis, respectively. The double-stranded
cDNA was then tailed with oligo(dC) and annealed to PstI-cleaved,
oligo(dG)-tailed pAT153, according to protocols described by Land et al.
(16). Escherichia coli HB101 was transformed with the annealed plasmid/cDNA
using the method of Dagert and Ehrlich (17). Tetracycline-resistant,
ampicillin-sensitive bacterial clones from two independent cDNA cloning
experiments were screened for JHM-specific sequences by hybridization with a
32P-labelled, single-stranded cDNA probe that had been copied from genome
RNA isolated from purified virions.
Characterization of cloned cDNA 

The size of inserts in plasmids from strongly hybridizing clones was
determined by gel electrophoresis of DNA extracted by the method of Holmes
and Quigley (18). Plasmids containing the largest inserts were prepared from
1-litre cultures and purified by equilibrium centrifugation in ethidium
bromide/caesium chloride. Inserts were excised from the plasmids using PstI
and recovered from agarose gels by electroelution. The inserts were mapped
by partial digestion with restriction enzymes of DNA that had been
end-labelled using [32P]cordycepin triphosphate and terminal
deoxynucleotidyl transferase.
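The mapping step relies on a simple relationship, sketched below under an assumption the text does not state: that only one end of the insert carries the label. In a partial digest every labelled product then runs from the labelled end to one cleavage site, so the band lengths read off the gel are the distances of the sites from that end. The function names and the example positions are hypothetical.

```python
def labelled_partial_products(insert_length, site_positions):
    """Predict the labelled bands from a partial digest of end-labelled DNA.

    Assumes the label is on one end only (position 0) and that each site
    position is its distance in base pairs from that end. Only products
    that keep the labelled end are visible on an autoradiograph, so each
    visible band runs from position 0 to one cut site; molecules left
    uncut at every site give the full-length band.
    """
    bands = {p for p in site_positions if 0 < p < insert_length}
    bands.add(insert_length)
    return sorted(bands)


def sites_from_bands(band_lengths, insert_length):
    """Invert the ladder: each labelled band shorter than the full-length
    insert marks a cleavage site at that distance from the labelled end."""
    return sorted({b for b in band_lengths if 0 < b < insert_length})


if __name__ == "__main__":
    # Hypothetical 1700 bp insert with three sites for one enzyme.
    ladder = labelled_partial_products(1700, [250, 610, 1180])
    print(ladder)                          # [250, 610, 1180, 1700]
    print(sites_from_bands(ladder, 1700))  # [250, 610, 1180]
```

If both ends carried label, each site at distance p would give two bands (p and L - p), superimposing two ladders; the single-end assumption is what makes the direct read-out above valid.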
Nucleotide sequencing 

Fragments of two cDNA inserts were generated by a variety of restriction
enzymes and cloned into the M13 vectors mp8 and mp9 (19). The fragments were
sequenced using the chain-terminator method of Sanger et al. (20). Of the
cDNA, 77 % was sequenced on both strands, a further 14 % on different but
overlapping fragments of the same strand, and the remainder was sequenced at
least twice. Towards the end of the sequencing project, specific clones were
identified by their hybridization to a panel of characterized M13 clones.
M13 hybridization probes were prepared by the method of Hu and Messing.
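The three coverage classes quoted above can be tallied mechanically from a list of sequencing reads. The sketch below is illustrative only: the read tuples, clone names and category labels are hypothetical, and "overlap" is interpreted here as coverage of one strand by two or more different clones.

```python
from collections import defaultdict


def classify_coverage(length, reads):
    """Percentage of positions in each coverage class.

    reads: iterable of (clone_id, first, last, strand) with 1-based,
    inclusive coordinates and strand "+" or "-".
    Classes, checked in this order:
      "both"    - read on both strands
      "overlap" - one strand only, but from two or more different clones
      "repeat"  - one strand, one clone, read at least twice
      "single"  - read once only
      "none"    - not read at all
    """
    strands = defaultdict(set)
    clones = defaultdict(set)
    depth = defaultdict(int)
    for clone_id, first, last, strand in reads:
        for pos in range(first, last + 1):
            strands[pos].add(strand)
            clones[pos].add(clone_id)
            depth[pos] += 1

    counts = dict.fromkeys(["both", "overlap", "repeat", "single", "none"], 0)
    for pos in range(1, length + 1):
        if len(strands[pos]) == 2:
            counts["both"] += 1
        elif len(clones[pos]) >= 2:
            counts["overlap"] += 1
        elif depth[pos] >= 2:
            counts["repeat"] += 1
        elif depth[pos] == 1:
            counts["single"] += 1
        else:
            counts["none"] += 1
    return {name: 100.0 * n / length for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads on a 10-base stretch.
    reads = [("m1", 1, 6, "+"), ("m2", 4, 10, "-"),
             ("m3", 1, 3, "+"), ("m2", 7, 8, "-")]
    print(classify_coverage(10, reads))
```

A finished project in the sense described in the text would show no "single" or "none" positions.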
Translation in vitro, hybrid-arrested translation and polyacrylamide gel
electrophoresis

Polyadenylated RNA from cells infected with MHV-JHM was translated in a
rabbit reticulocyte lysate as previously described (15). Hybrid-arrested
translation experiments, using purified cDNA insert and polyadenylated RNA
from cells infected with MHV-JHM, were performed according to the method of
Paterson and Kuff, using a rabbit reticulocyte lysate. Translation products
were analysed on 15 % polyacrylamide gels.
Fig. 1

Hybrid-arrested translation of MHV-JHM mRNA7

Autoradiograph of the [35S]methionine-labelled products synthesized in a
rabbit reticulocyte lysate and separated on a 15 % polyacrylamide-SDS gel.
The samples translated were: (a) no added RNA; (b-f) 250 ng of
polyadenylated, cytoplasmic RNA from cells infected with MHV-JHM, together
with either 500 ng of the insert DNA (c and d) or 1000 ng of the insert DNA
(e and f). Samples (c) and (e) were in the hybrid conformation and samples
(d) and (f) were heated to melt the hybrids. The major products of 60000 and
23000 mol.wt. have been identified as the nucleocapsid and matrix proteins,
respectively. Sample (m) contained 14C-labelled molecular weight markers
(CFA626, Amersham Buchler, Braunschweig, F.R.G.).


RESULTS 
Identification of cDNA clones as copies of mRNA7 
The largest JHM-specific cDNA clones from each of two cloning experiments
(those within pMS38 and pSS38) were analyzed by digestion with restriction
enzymes, which showed that together they contained enough sequence to
represent a full-length copy of mRNA7 (about 1800 bases). The insert in
pSS38 was about 1700 base pairs and was therefore sufficient to account for
most of mRNA7. The insert in pMS38 contained 830 base pairs. Mapping of both
inserts, by partial digestion with restriction endonucleases, showed that
the insert in pMS38 contained an extra 40 base pairs at one end.
Confirmation that the cDNA insert in pSS38 represented the body of mRNA7
was obtained by hybrid-arrested translation (Fig. 1). When hybridized to
cytoplasmic, polyadenylated RNA from cells infected with JHM, the insert
specifically inhibited the translation of a 60000 mol.wt. polypeptide that
has previously been identified as the intracellular precursor of the virion
nucleocapsid protein produced from mRNA7 (15). Melting of the hybrids before
translation restored the synthesis of nucleocapsid protein.

Fig. 2

Diagram showing the restriction endonuclease sites in the DNA copy of
MHV-JHM mRNA7 that were used for subcloning into M13 vectors. Solid arrows
show the direction and extent of sequence obtained from each clone. Broken
lines indicate the probable extent of each clone. Cleavage sites are shown
for HaeIII, MspI, BalI, AluI, Sau3A and PvuII. The box at the 3' terminus
represents the poly(A) tract.
Nucleotide sequence of cloned cDNA

Fragments of both cDNA inserts were generated by various restriction 
enzymes (Fig. 2), cloned into M13 vectors and their sequences were deter- 
mined. The combined sequence of 1767 base pairs (contained between the oligo 
dC/dG tails made during the process of cloning the cDNA) is presented in Fig. 
3. A 5' non-coding sequence of 83 base pairs preceded the first AUG codon, 
which initiated an open reading frame (1365 bases long) with the potential to 
encode a basic polypeptide of 455 amino acids (49700 mol.wt.). The first 
putative termination codon in this reading frame was found at nucleotide 1449
and was followed by 301 base pairs of 3' non-coding sequence. A polyadenylate
tract was found beginning at nucleotide 1753. The only other open reading
frame of more than 100 bases extended from nucleotide 361 to 771 (inclusive)
and could potentially encode 137 amino acids. Within the 5' non-coding
sequence, two large, RNase T1-resistant oligonucleotides were found (at
positions 26-50 and 54-82), the compositions of which were very similar to
those of the MHV-A59 oligonucleotides 10 and 19 (8), respectively.
Strikingly, the larger oligonucleotide contained a triple repeat of the
pentamer AATCT, and this sequence was found within both independently
generated clones.

Fig. 3

Complete nucleotide sequence of the DNA copy of MHV-JHM mRNA7 (1767
nucleotides), including a 15-base terminal poly(A) tract. The predicted
sequence of the encoded protein is also depicted.
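The coordinates quoted above are internally consistent, and the ORF search itself is easy to sketch. The code below is a hypothetical illustration rather than the program used for the original analysis. It assumes an ORF runs from an ATG to the last codon before the first in-frame stop (the convention under which 1365 bases = 455 codons), scans only the mRNA-sense strand, and re-derives the layout 83 + 1365 + 3 + 301 + 15 = 1767.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_bases=100):
    """Return 1-based (first, last) coordinates of ATG-initiated ORFs.

    Only the given (mRNA-sense) strand is scanned, in all three frames.
    An ORF runs from the A of an ATG to the last base before the first
    in-frame stop codon, so the stop codon itself is not counted.
    In-frame ATGs inside an ORF are not reported separately.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop before the end: not an open frame
                if j - i > min_bases:
                    orfs.append((i + 1, j))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return sorted(orfs)


def check_layout(utr5=83, orf=1365, utr3=301, poly_a=15):
    """Recompute the landmark positions from the segment lengths in the text."""
    aug = utr5 + 1                     # first base of the AUG
    stop = aug + orf                   # first base of the stop codon
    poly_a_start = stop + 3 + utr3     # poly(A) begins after the 3' non-coding region
    total = poly_a_start + poly_a - 1  # last base of the poly(A) tract
    return {"aug": aug, "stop": stop, "residues": orf // 3,
            "poly_a_start": poly_a_start, "total": total}


if __name__ == "__main__":
    print(check_layout())
    # Second ORF, nucleotides 361-771 inclusive: 411 bases = 137 codons.
    print((771 - 361 + 1) // 3)
```

`check_layout()` returns an AUG at 84, a stop codon at 1449, 455 residues, poly(A) starting at 1753 and a total of 1767 nucleotides, matching the text. As a rough cross-check, 455 residues at an assumed average of about 110 Da each come to about 50000, in line with the 49700 computed from the actual sequence.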
DISCUSSION

Analysis of the sequence of MHV genome RNA and subgenomic mRNAs is a way 
of predicting the primary structure of encoded polypeptides and, at the same 
time, identifying non-coding sequences which may be relevant to the regu- 
lation of the viral genes. A comparative analysis of the sequences of dif- 
ferent MHV strains, or mutants with altered phenotypes, will also be needed 
as a basis for further studies on MHV replication and pathogenicity. 

Three criteria identify the cloned sequences described here as representing
the MHV-JHM nucleocapsid gene. Firstly, the 3'-terminal location of this
sequence on the genome (unpublished data) is consistent with the known gene
order of MHV-JHM. Secondly, the cloned DNA can specifically arrest the
translation in vitro of mRNA7 and, thirdly, the characteristics of the
predicted polypeptide, a basic protein of 49700 mol.wt., are consistent with
the electrophoretic properties of the nucleocapsid protein in
two-dimensional, non-equilibrium pH gradient gel electrophoresis. We believe
that the cDNA sequence represents a complete copy of mRNA7 because it extends
from a polyadenylate tract to a 5'-terminal sequence, UAUAAG, which is very
similar to the cap-NUAAG sequence found by direct sequencing at the 5'
terminus of mRNAs of the related MHV-A59.

Examination of the sequence of MHV-JHM mRNA7 reveals several interesting
features. The presence of the two large RNase T1-resistant oligonucleotides
(with base compositions similar to those of oligonucleotides 10 and 19 of
A59 (8)) within the 5' non-coding sequence supports the proposal that MHV
subgenomic mRNAs contain a leader sequence (7, 8). The A59 oligonucleotide 10
(equivalent to the JHM oligonucleotide at positions 26-50) is found in all
subgenomic mRNAs and in genomic RNA. It is not, however, found in the
3'-terminal fragments of the genome corresponding to mRNA7 and so must lie
nearer the 5' end of the genome. A59 oligonucleotide 19 (equivalent to the
JHM oligonucleotide at positions 54-82) is found only in mRNA7, not in the
equivalent region of the genome, and related oligonucleotides are found in
other mRNAs. It has therefore been suggested that oligonucleotide 10 is
contained within a leader sequence (7) derived from the 5' end of the genome
and that oligonucleotide 19 is formed by fusion of the leader to the body of
mRNA7 (7, 8). The relative positions of the equivalent JHM oligonucleotides
are consistent with this proposal. It is interesting to note that the JHM
equivalent of oligonucleotide 19 contained the triple repeat of the pentamer
AATCT. It is likely that such a striking feature has a significant
functional role in the production of subgenomic mRNAs. MHV is known to
replicate in the absence of
the host cell nucleus (…, 28). It has also been shown that the target size
for UV inactivation of the synthesis of each MHV mRNA is identical to the
physical size of that mRNA. If each mRNA were spliced from a larger
precursor, its target size would instead correspond to that of the
precursor; thus the mRNAs cannot be produced by splicing of larger
precursors. It has been suggested that the mRNAs could be produced by
extension of a virus-encoded RNA primer or by a previously undescribed,
unusually specific polymerase-jumping mechanism.
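RNase T1 cleaves RNA on the 3' side of every guanosine, so the T1-resistant oligonucleotides discussed above are runs of non-G bases closed by a single G. The sketch below simulates a complete T1 digest and then looks for perfect head-to-tail copies of a motif such as AATCT (AAUCU in RNA). It uses a made-up sequence rather than the JHM leader, and the length cut-off for a "large" oligonucleotide is a hypothetical parameter.

```python
import re


def rnase_t1_digest(rna, min_length=1):
    """Simulate a complete RNase T1 digest by cutting 3' of every G.

    Returns (start, oligonucleotide) pairs with 1-based start positions.
    Every product ends in G except possibly the 3'-terminal one.
    DNA input is converted to RNA (T -> U) first.
    """
    rna = rna.upper().replace("T", "U")
    products = []
    start = 0
    for i, base in enumerate(rna):
        if base == "G":
            products.append((start + 1, rna[start:i + 1]))
            start = i + 1
    if start < len(rna):
        products.append((start + 1, rna[start:]))
    return [(s, oligo) for s, oligo in products if len(oligo) >= min_length]


def tandem_repeats(seq, unit, min_copies=3):
    """Return (1-based start, copy number) for each run of at least
    min_copies perfect, head-to-tail copies of unit."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    return [(m.start() + 1, len(m.group()) // len(unit))
            for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy = "AUCGAAUCUAAUCUAAUCUGCC"  # hypothetical sequence
    large = rnase_t1_digest(toy, min_length=10)
    print(large)  # [(5, 'AAUCUAAUCUAAUCUG')]
    for start, oligo in large:
        print(start, tandem_repeats(oligo, "AAUCU"))  # 5 [(1, 3)]
```

Because a triple AAUCU repeat contains no G, it necessarily stays inside a single T1 product, which is consistent with its appearing within one large T1-resistant oligonucleotide.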

From the sequence alone, it is difficult to speculate about the importance
of specific features of the nucleocapsid protein structure, at least until
more is known about the interactions between nucleocapsid protein and genome
RNA and between the ribonucleoprotein complex and other virion proteins.
Particularly intriguing is the specific interaction of nucleocapsid protein
with genome RNA but not with subgenomic mRNAs. The MHV-JHM nucleocapsid
protein does not contain the clusters of lysine residues seen in the
N-terminal portion of the capsid protein of Semliki Forest Virus (22) so, in
this respect, it is similar to the nucleocapsid protein of influenza virus.
Some regions enriched in basic amino acids are, however, apparent (arg-43 to
lys-50, lys-101 to lys-113, arg-196 to lys-230, arg-264 to arg-290 and
lys-392 to lys-405) and, as in the nucleocapsid protein of Semliki Forest
Virus, the carboxy terminus is acidic. The region ser-194 to ser-220
contains 9 serine residues, or 24 % of the total serine content, within 6 %
of the coding sequence. Moreover, this is a basic region of the protein.
Another 4 serines are found in the five residues from ser-12 to ser-16. This
clustering of serine residues may be significant in view of the fact that
JHM nucleocapsid protein is phosphorylated specifically at serine residues.
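Residue clusters of this kind can be located with a sliding-window count. The sketch below is a hypothetical illustration run on a toy peptide, not on the JHM protein. It also spells out the arithmetic behind the quoted serine figures: ser-194 to ser-220 spans 27 residues, about 5.9 % of 455, and 9 serines being 24 % (after rounding) is consistent with a total of 37 or 38 serines.

```python
def residue_windows(protein, residues, window, min_count):
    """Return (start, end, count) for every window of the given width that
    contains at least min_count residues from `residues` (1-based, inclusive)."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        count = sum(aa in residues for aa in segment)
        if count >= min_count:
            hits.append((start + 1, start + window, count))
    return hits


def region_share(protein, residue, first, last):
    """Return (share of all `residue` occurrences found in first..last,
    share of the protein length occupied by that region), as fractions."""
    region = protein[first - 1:last]
    total = protein.count(residue)
    share = region.count(residue) / total if total else 0.0
    return share, len(region) / len(protein)


if __name__ == "__main__":
    # Arithmetic behind the quoted figures: ser-194..ser-220 is 27 residues.
    print(round(100 * 27 / 455, 1))                         # 5.9
    # 9 serines = 24 % (rounded) fits a total of 37 or 38 serines.
    print(round(100 * 9 / 37, 1), round(100 * 9 / 38, 1))   # 24.3 23.7
    # Hypothetical toy peptide: four serines in five residues.
    print(residue_windows("ASSSSTGGGG", {"S"}, 5, 4))       # [(1, 5, 4), (2, 6, 4)]
```

The same function with `residues={"K", "R"}` would flag lysine/arginine-rich stretches like the basic regions listed above; window width and threshold are free choices, not values taken from the paper.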

The sequence of MHV-JHM mRNA7 can also be compared with the recently
published sequence of the nucleocapsid gene of MHV-A59 (14). The major
differences between the JHM coding sequence reported here and the sequence
reported for MHV-A59 (14) are that the JHM sequence contains an additional
base (nucleotide 408) and lacks a base (after nucleotide 725). However, it
is now clear (J. Armstrong, personal communication) that the A59 sequence
should be identical to the JHM sequence at these positions. The two
sequences are therefore very similar, with 94 % overall homology within the
coding sequence. This finding is in accord with the degree of homology
estimated from the hybridization kinetics of cDNA, copied from MHV-A59
mRNA7, with mRNA from cells infected with MHV-JHM. Although the sequence
homology is high overall, it is not constant throughout the length of the
coding sequence. Between nucleotides 497 and 569 the homology falls to 63 %,
and in a sequence of 23 bases (nucleotides 1271-1293) near the 3' end of the
coding sequence, only 9 bases are common to both strains. Also, an extra
glutamine codon is found in the JHM sequence at nucleotide 1227.
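Percentage homology of the kind quoted here is, in its simplest form, the fraction of identical positions in an alignment. The sketch below assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and counts a column gapped in only one sequence as a mismatch; this is one common convention, not necessarily the one behind the figures above. It also scans for windows of low identity, the kind of local divergence described for nucleotides 497-569 and 1271-1293 (9 identical bases out of 23 is about 39 %).

```python
def percent_identity(a, b):
    """Percentage of identical columns in two pre-aligned sequences.

    "-" marks a gap. Columns that are gaps in both sequences are skipped;
    columns with a gap in only one sequence count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    same = sum(1 for x, y in columns if x == y)
    return 100.0 * same / len(columns)


def low_identity_windows(a, b, window, threshold):
    """Return (first, last, identity) for alignment windows below threshold %."""
    hits = []
    for start in range(len(a) - window + 1):
        ident = percent_identity(a[start:start + window], b[start:start + window])
        if ident < threshold:
            hits.append((start + 1, start + window, ident))
    return hits


if __name__ == "__main__":
    print(round(100 * 9 / 23, 1))              # 39.1
    print(percent_identity("ACGT", "ACGA"))    # 75.0
    print(low_identity_windows("AAAAAAAAAA", "AAAAATTTTT", 5, 50))
```

Window coordinates refer to alignment columns, which coincide with nucleotide numbers only when no gaps occur upstream. An insertion such as the extra glutamine codon would appear as a three-column gap in the other sequence.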

The non-coding regions of mRNA7 from MHV-JHM and those reported for MHV-A59
(14) may also be compared. The 5' non-coding sequences are identical for the
17 nucleotides immediately before the AUG codon, but the rest of the JHM 5'
non-coding sequence differs from the sequence reported for A59 (14). More
recent studies on the sequence of A59 subgenomic messengers (J. Armstrong,
personal communication) have shown that the reported 5' non-coding sequence
(14) is not derived from mRNA7, and therefore no further comparison can be
made. When the 3' non-coding sequences are compared, they are found to be
highly conserved (98 % homology). We believe that this conservation of
sequence at the termini of MHV RNA is significant and is most likely related
to the interaction of these molecules with RNA polymerase during the
synthesis of negative-stranded template as well as during the production of
subgenomic mRNAs and the replication of genome RNA.

In conclusion, this comparison reveals that, as might have been expect- 
ed, the sequence encoding the nucleocapsid protein of MHV is relatively 
conserved, even between strains of MHV that show differences in pathogeni- 
city. Our immediate aim is to determine the level of sequence homology, 
between the same strains, for genes more directly involved in the interaction 
between virus and the host cells. 